move the LLM instance directly to Assistant to make it cleaner to share with tests#71
Conversation
You did this by changing to 5.2 rather than 4.1-mini. I'm confused what you mean here, since the judge model was already separate from the agent model, and if anything you don't want a heavy judge model that will slow down every single test, since IIRC we don't run tests in parallel by default.
@u9g Ah, I got confused because I read the original PR into the Python example, which used the same model in both places, whereas the original Node PR had separate ones. I didn't look at the current state of the code before asking Claude to do it, so I missed that it had already separated them, which is good. I changed it back to 4.1-mini for now (will do so in Node too), but I'll say that I'm not really sure whether it makes sense to eval with a small model; this is something we should probably develop some benchmarks around. On a first-principles basis, I'd prefer to trust a larger model for evals than the one used in the actual conversation. In conversation, the focus needs to be on latency; outside of conversation, I'm willing to spend longer really determining whether the conversation was good. This is maybe more of a concern with real session evals than with in-codebase synthetic evals like we have here.
Yeah, as with anything, it makes sense to benchmark. If we're changing to a bigger judge model, it might be worth considering parallel testing in the templates.
this gets rid of the awkward AGENT_MODEL constant by just making the LLM an inherent property of the Assistant, which seems more intuitive
I also switched the test judge model to base 5.2 instead of the chat version to make it clearer that you can (and should) use a different model for evals than for core chat.
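A minimal sketch of the design change described above: instead of a module-level AGENT_MODEL constant, the Assistant owns its LLM directly, and tests construct a separate (potentially heavier) judge model. The class and field names here are illustrative stand-ins, not the actual livekit-agents API.

```python
from dataclasses import dataclass, field


@dataclass
class LLM:
    """Illustrative stand-in for an LLM client configured with a model name."""
    model: str


@dataclass
class Assistant:
    # The LLM is an inherent property of the Assistant,
    # replacing a shared AGENT_MODEL constant.
    llm: LLM = field(default_factory=lambda: LLM(model="gpt-4.1-mini"))


# In tests, the same Assistant instance is shared, while the judge
# uses a distinct, larger model for evals.
assistant = Assistant()
judge = LLM(model="gpt-5.2")  # eval model, separate from the chat model

print(assistant.llm.model, judge.model)
```

This keeps the chat model fast for latency-sensitive conversation while letting the eval path spend more compute on judging, as discussed in the thread.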
I'd like to move STT and TTS as well, but I found a bug we need to fix first
also see livekit-examples/agent-starter-node#43